Using Regret Estimation to Solve Games Compactly

Authors

  • Dustin Morrill
  • Andrew Bagnell

Abstract

Game-theoretic solution concepts, such as Nash equilibrium strategies, which are optimal against worst-case opponents, provide guidance in finding desirable autonomous agent behaviour. In particular, we wish to approximate solutions to complex, dynamic tasks, such as negotiation or bidding in auctions. Computational game theory investigates effective methods for computing such strategies. Solving human-scale games, however, is currently intractable. Counterfactual Regret Minimization (CFR) [43] is a regret-minimizing online learning algorithm that dominates the Annual Computer Poker Competition (ACPC) and lends itself readily to various sampling and abstraction techniques. Abstract games are created to mirror the strategic elements of an original game in a more compact representation. The abstract game can be solved, and its solution can be translated back into the full game. But crafting an abstract game requires domain-specific knowledge, and an abstraction can interact with the game-solving process in unintuitive and harmful ways. For example, abstraction can create pathologies in which solutions to finer-grained abstractions are more exploitable by a worst-case opponent in the full game than solutions derived from coarser abstractions [42]. An abstraction that could be changed dynamically and informed by the solution process could potentially produce better solutions more consistently. We suggest that such abstractions can be largely subsumed by a regressor on game features that estimates regret during CFR. Replacing abstraction with a regressor makes the memory required to approximate a solution to a game proportional to the complexity of the regressor rather than to the size of the game itself. Furthermore, the regressor essentially becomes a tunable, compact, and dynamic abstraction of...
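
The per-decision update at the heart of CFR is regret matching: each action is played in proportion to its positive cumulative regret. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It uses rock-paper-scissors, a toy game chosen here only because it has a single decision point, so CFR reduces to regret-matching self-play. The `cum_regret` table holds exactly the quantity the abstract proposes to estimate with a regressor on game features; here it stays tabular. The helper names (`self_play`, `action_values`, `nash_conv`) and the non-zero starting regret are hypothetical.

```python
from typing import List

Strategy = List[float]


def payoff(a: int, b: int) -> int:
    """Rock-paper-scissors utility for playing action a against action b (0=rock, 1=paper, 2=scissors)."""
    if a == b:
        return 0
    # Action (b + 1) % 3 beats action b: paper beats rock, scissors beat paper, rock beats scissors.
    return 1 if a == (b + 1) % 3 else -1


def regret_matching(regrets: List[float]) -> Strategy:
    """Play each action in proportion to its positive cumulative regret (uniform if none is positive)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def action_values(opponent: Strategy) -> List[float]:
    """Expected utility of each pure action against the opponent's mixed strategy."""
    return [sum(opponent[b] * payoff(a, b) for b in range(3)) for a in range(3)]


def self_play(iterations: int, initial_regret: List[float]) -> List[Strategy]:
    """Regret-matching self-play; returns each player's average strategy."""
    # Tabular cumulative regrets: the quantity the abstract proposes to estimate with a regressor.
    # Player 0 starts from a hypothetical non-zero regret vector so the dynamics are non-trivial.
    cum_regret = [list(initial_regret), [0.0, 0.0, 0.0]]
    cum_strategy = [[0.0] * 3 for _ in range(2)]
    for _ in range(iterations):
        # Both players act simultaneously from their current regrets.
        strategies = [regret_matching(cum_regret[p]) for p in range(2)]
        for p in range(2):
            values = action_values(strategies[1 - p])
            expected = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(3):
                # Instantaneous regret: how much better action a would have done than the current strategy.
                cum_regret[p][a] += values[a] - expected
                cum_strategy[p][a] += strategies[p][a]
    # The average strategy, not the current one, is what approaches equilibrium.
    return [[s / iterations for s in cum_strategy[p]] for p in range(2)]


def nash_conv(strategies: List[Strategy]) -> float:
    """Total best-response gain of both players; zero exactly at a Nash equilibrium."""
    total = 0.0
    for p in range(2):
        values = action_values(strategies[1 - p])
        current = sum(s * v for s, v in zip(strategies[p], values))
        total += max(values) - current
    return total


if __name__ == "__main__":
    average = self_play(10_000, initial_regret=[1.0, 0.0, 0.0])
    print([round(x, 3) for x in average[0]], round(nash_conv(average), 4))
```

In a zero-sum game, the sum of both players' average regrets bounds `nash_conv` of the average profile, which is why the average strategy is returned. Swapping the `cum_regret` table for a regressor's predictions would leave `regret_matching` itself unchanged, which is the kind of substitution the abstract proposes; this sketch does not model how such a regressor is trained.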

Related resources

Collocation Method using Compactly Supported Radial Basis Function for Solving Volterra's Population Model

In this paper, indirect collocation approach based on compactly supported radial basis function (CSRBF) is applied for solving Volterra's population model. The method reduces the solution of this problem to the solution of a system of algebraic equations. Volterra's model is a non-linear integro-differential equation where the integral term represents the effect of toxin. To solve the pr...

Regret Minimization in Games with Incomplete Information

Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, whi...

MCRNR: Fast Computing of Restricted Nash Responses by Means of Sampling

This paper presents a sample-based algorithm for the computation of restricted Nash strategies in complex extensive form games. Recent work indicates that regret-minimization algorithms using selective sampling, such as Monte-Carlo Counterfactual Regret Minimization (MCCFR), converge faster to Nash equilibrium (NE) strategies than their non-sampled counterparts which perform a full tree travers...

Slumbot NL: Solving Large Games with Counterfactual Regret Minimization Using Sampling and Distributed Processing

Slumbot NL is a heads-up no-limit hold’em poker bot built with a distributed disk-based implementation of counterfactual regret minimization (CFR). Our implementation enables us to solve a large abstraction on commodity hardware in a cost-effective fashion. A variant of the Public Chance Sampling (PCS) version of CFR is employed which works particularly well with

Publication date: 2016